From Probability to Likelihood: The Science of Inference
MATH003 Lesson 6
Statistical inference marks the transition from predicting outcomes based on known parameters (Probability) to determining which parameters are most consistent with observed data (Likelihood). While a probability density function $f(x|\theta)$ describes the distribution of data $x$ for a fixed $\theta$, the Likelihood function $L(\theta|x)$ treats the observed data as fixed and varies the parameter $\theta$ to quantify the relative support for different hypotheses.

The Inversion Principle

The likelihood function is the joint density of the data, viewed as a function of $\theta$ with the observed data held fixed. For a Normal model with known variance $\sigma_0^2$, the likelihood is, up to a multiplicative constant not depending on $\theta$:

$L ( \theta | x_1, \dots, x_n ) = \exp\left( -\frac{n}{2\sigma_0^2} (\bar{x} - \theta)^2 \right)$

Here, we evaluate the "plausibility" of different $\theta$ values given the sample mean $\bar{x}$. To find the peak of this plausibility, we use Definition 6.2.2: the log-likelihood $l(\theta | s) = \ln L(\theta | s)$. Because $\ln$ is strictly increasing, maximizing $l$ is equivalent to maximizing $L$, and the transformation turns products of independent observations into sums, making the maximization of complex models computationally feasible.
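As a sketch of this idea, assuming the Normal model above with known $\sigma_0$ (the sample, seed, and grid below are hypothetical choices for illustration), the following snippet evaluates the log-likelihood on a grid of $\theta$ values and checks that its maximizer lands at the sample mean:

```python
import numpy as np

def log_likelihood(theta, x, sigma0):
    """Log-likelihood of a Normal(theta, sigma0^2) model,
    up to an additive constant not depending on theta."""
    n = len(x)
    xbar = np.mean(x)
    return -n / (2 * sigma0**2) * (xbar - theta) ** 2

# Hypothetical data: 100 draws from a Normal with known sigma0 = 2.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100)

# Evaluate l(theta) on a grid and locate its peak.
thetas = np.linspace(3.0, 7.0, 401)        # grid spacing 0.01
ll = np.array([log_likelihood(t, x, sigma0=2.0) for t in thetas])
mle = thetas[int(np.argmax(ll))]

print(mle, np.mean(x))  # grid maximizer should sit next to the sample mean
```

Since $l(\theta)$ here is a downward parabola in $\theta$, its exact maximizer is $\bar{x}$; the grid search recovers it to within the grid spacing.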

Worked Example: The Height Survey (EXAMPLE 6.3.5)

The Data

Consider a sample of $n=30$ heights with a calculated standard deviation of $s=2.379$. Using the Location-Scale Normal Model, we seek to infer the true mean $\theta$.

Inference & Precision

The standard error of $\bar{x}$ is $s/\sqrt{n} = 2.379/\sqrt{30} \approx 0.43434$. This value measures the "sharpness" of our likelihood peak: a smaller standard error implies a narrower, sharper peak, representing higher precision in our inference about $\theta$.
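The arithmetic can be checked directly, using the sample summaries stated in the example:

```python
import math

s, n = 2.379, 30            # sample standard deviation and sample size
se = s / math.sqrt(n)       # standard error of the sample mean
print(round(se, 5))         # 0.43434
```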

Dimensionality and Constraints

In complex scenarios like EXAMPLE 6.1.5 (Multinomial Models), we must account for logical dependencies. As noted, "Notice that it is really only two-dimensional, because as soon as we know the value of any two of the $\theta_i$'s... we immediately know the value of the remaining parameter." This constraint is vital for correctly defining the parameter space $\Omega$.
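To illustrate the constraint (the three-category counts below are hypothetical), the multinomial log-likelihood can be written as a function of only two free parameters, with the third fixed by $\theta_1 + \theta_2 + \theta_3 = 1$; the maximizer is the vector of sample proportions:

```python
import math

# Hypothetical counts for a 3-category multinomial, n = 10.
counts = [6, 3, 1]
n = sum(counts)

def log_likelihood(theta1, theta2):
    """Multinomial log-likelihood over the 2-D parameter space;
    theta3 is determined by the constraint theta1 + theta2 + theta3 = 1."""
    theta3 = 1.0 - theta1 - theta2
    if theta1 <= 0 or theta2 <= 0 or theta3 <= 0:
        return float("-inf")   # outside the parameter space Omega
    return (counts[0] * math.log(theta1)
            + counts[1] * math.log(theta2)
            + counts[2] * math.log(theta3))

# The MLE is the vector of sample proportions.
print([c / n for c in counts])  # [0.6, 0.3, 0.1]
```

Even though three probabilities appear in the model, the function above takes only two arguments, which is exactly the two-dimensionality the quotation describes.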

Asymptotic Foundations

The bridge from likelihood to inference relies on the Central Limit Theorem: as $n \to \infty$, suitably standardized estimators converge in distribution to a normal. Specifically, in the EXAMPLE 6.5.4 Bernoulli Model:

$Z = \frac{\sqrt{n}(\bar{X} - \theta)}{\sqrt{\bar{X}(1 - \bar{X})}} \xrightarrow{D} N(0, 1)$

This allows us to quantify uncertainty using z-intervals and p-values, provided we have sufficiently large samples.

🎯 Core Principle
Distribution-free methods of statistical inference require only minimal assumptions about the sampling distribution, making them robust when the family $\{P_{\theta} : \theta \in \Omega\}$ is very large. In contrast, parametric likelihood methods rely on the curvature of the log-likelihood, where the Fisher Information $nI(\theta)$ determines the variance of our score function.